Theory
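Many relationships between variables are nonlinear, so a multiple linear regression can only be regarded as a first-order approximation of a nonlinear economic relationship. Second-order and higher-order terms can also matter, so after fitting a multiple linear regression we can test whether higher-order terms have been left out. Specifically, we can run Ramsey's RESET test and the link test.
Consider the following regression equation: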
$$ y = x'\beta + \xi $$
The fitted values after estimation are:
$$ \hat y = x'b $$
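The RESET test adds powers of the fitted values to the original regression:
$$ y = x'\beta + \delta_2 \hat y^2 + \delta_3 \hat y^3 + \delta_4 \hat y^4 + \xi $$
and runs an F test of the null hypothesis $H_0: \delta_2=\delta_3=\delta_4=0$. Rejecting $H_0$ indicates that the model omits higher-order nonlinear terms.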
Another form of the RESET test uses powers of the explanatory variables as the nonlinear terms.
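As a sketch of this variant, assume the second through fourth powers of each of the $K$ non-constant regressors $x_j$ are added. The coefficient names $\gamma_{jp}$ are notation introduced here for illustration and do not appear in the original text. The auxiliary regression can then be written as:

$$ y = x'\beta + \sum_{j=1}^{K} \sum_{p=2}^{4} \gamma_{jp} x_j^p + \xi, \qquad H_0: \gamma_{jp} = 0 \ \text{for all } j, p $$

Under this assumption the F test imposes $3K$ restrictions, compared with 3 restrictions when powers of the fitted values are used.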
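Another specification test is the link test, whose regression equation is:
$$ y = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + e $$
If the model is correctly specified, $\hat y^2$ should have no additional explanatory power, so the null hypothesis is $H_0: \delta_2 = 0$, tested with the t statistic on $\hat y^2$.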
Stata practice
We use the dataset data/nerlove.dta as an example. First, load the data:

    use data/nerlove.dta, clear

    (Nerlove 1963 paper)
Take a look at the basic structure of the data:

    des

    Contains data from data/nerlove.dta
      obs:           145                          Nerlove 1963 paper
     vars:            10                          13 Aug 2012 10:00
     size:         5,220
    --------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    tc              float   %9.0g                 total cost
    q               int     %8.0g                 total output
    pl              float   %9.0g                 price of labor
    pf              float   %9.0g                 price of fuel
    pk              int     %8.0g                 user cost of capital
    lntc            float   %9.0g
    lnq             float   %9.0g
    lnpf            float   %9.0g
    lnpk            float   %9.0g
    lnpl            float   %9.0g
    --------------------------------------------------------------------------------
    Sorted by:
First, fit the multiple linear regression:

    reg lntc lnq lnpl lnpk lnpf

          Source |       SS           df       MS      Number of obs   =       145
    -------------+----------------------------------   F(4, 140)       =    437.90
           Model |  269.524728         4  67.3811819   Prob > F        =    0.0000
        Residual |  21.5420958       140  .153872113   R-squared       =    0.9260
    -------------+----------------------------------   Adj R-squared   =    0.9239
           Total |  291.066823       144  2.02129738   Root MSE        =    .39227
    ------------------------------------------------------------------------------
            lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lnq |   .7209135   .0174337    41.35   0.000     .6864462    .7553808
            lnpl |   .4559645    .299802     1.52   0.131    -.1367602    1.048689
            lnpk |  -.2151476   .3398295    -0.63   0.528    -.8870089    .4567136
            lnpf |   .4258137   .1003218     4.24   0.000     .2274721    .6241554
           _cons |  -3.566513   1.779383    -2.00   0.047    -7.084448   -.0485779
    ------------------------------------------------------------------------------
Run the link test:

    linktest

          Source |       SS           df       MS      Number of obs   =       145
    -------------+----------------------------------   F(2, 142)       =   1460.70
           Model |  277.574775         2  138.787388   Prob > F        =    0.0000
        Residual |  13.4920481       142  .095014423   R-squared       =    0.9536
    -------------+----------------------------------   Adj R-squared   =    0.9530
           Total |  291.066823       144  2.02129738   Root MSE        =    .30824
    ------------------------------------------------------------------------------
            lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            _hat |    .791953   .0293837    26.95   0.000      .733867    .8500389
          _hatsq |   .0941454   .0102281     9.20   0.000     .0739264    .1143643
           _cons |  -.0962174   .0425807    -2.26   0.025    -.1803914   -.0120434
    ------------------------------------------------------------------------------
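To make the link test regression concrete, it can be rebuilt by hand from the fitted values. This is a minimal sketch, not a description of how linktest works internally. The variable names yhat and yhat2 are introduced here for illustration, and the coefficients on them should correspond to _hat and _hatsq in the linktest output.

```stata
* Manual link test: regress y on the fitted values and their square.
* yhat and yhat2 are illustrative names, not part of the original post.
quietly regress lntc lnq lnpl lnpk lnpf
* Fitted values x'b from the base model
predict double yhat, xb
generate double yhat2 = yhat^2
* The t test on yhat2 is the link test of H0: delta_2 = 0
regress lntc yhat yhat2
* Clean up and make the base model the active estimation again,
* so that later postestimation commands refer to it
drop yhat yhat2
quietly regress lntc lnq lnpl lnpk lnpf
```

The final regress is there only so that commands such as estat ovtest keep operating on the original model.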
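The coefficient on the squared term (_hatsq) is significant, so we reject the null hypothesis. This indicates that the model is misspecified and that we should consider adding higher-order terms. Next, run the RESET test:

    estat ovtest

    Ramsey RESET test using powers of the fitted values of lntc
           Ho:  model has no omitted variables
                     F(3, 137) =     32.72
                      Prob > F =      0.0000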
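The RESET regression described in the theory section can also be sketched by hand: add the second through fourth powers of the fitted values to the base model and jointly test them. The names yhat, yhat2, yhat3 and yhat4 are illustrative assumptions. The resulting F statistic should be close to the value reported by estat ovtest, though small numerical differences are possible.

```stata
* Manual RESET test using powers of the fitted values.
* yhat, yhat2, yhat3, yhat4 are illustrative names.
quietly regress lntc lnq lnpl lnpk lnpf
predict double yhat, xb
generate double yhat2 = yhat^2
generate double yhat3 = yhat^3
generate double yhat4 = yhat^4
* Augmented regression with the three nonlinear terms
regress lntc lnq lnpl lnpk lnpf yhat2 yhat3 yhat4
* Joint F test of H0: delta_2 = delta_3 = delta_4 = 0
test yhat2 yhat3 yhat4
* Clean up and restore the base model as the active estimation
drop yhat yhat2 yhat3 yhat4
quietly regress lntc lnq lnpl lnpk lnpf
```

With 145 observations, 5 coefficients in the base model and 3 added terms, the denominator degrees of freedom are 145 - 5 - 3 = 137, which matches F(3, 137) in the output.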
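The p-value of the F test is reported as 0.0000, so we reject the null hypothesis of no omitted variables; the model is misspecified. Next, run the variant that uses powers of the explanatory variables:

    estat ovtest, rhs

    Ramsey RESET test using powers of the independent variables
           Ho:  model has no omitted variables
                   F(12, 128) =      8.96
                      Prob > F =      0.0000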
The null hypothesis is again rejected.
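The variant based on powers of the explanatory variables can be sketched in the same way. The loop below assumes the second through fourth powers of each regressor are used, which is consistent with the 12 numerator degrees of freedom above (4 regressors times 3 powers). The generated names such as lnq_p2 are hypothetical. Raw powers are highly correlated, so Stata may omit some terms for collinearity, in which case the degrees of freedom and the F statistic will differ from the estat ovtest, rhs output.

```stata
* Manual RESET test using powers of the regressors (run as a do-file).
* Names of the form var_p2, var_p3, var_p4 are illustrative.
quietly regress lntc lnq lnpl lnpk lnpf
local powers
foreach v of varlist lnq lnpl lnpk lnpf {
    forvalues p = 2/4 {
        generate double `v'_p`p' = `v'^`p'
        local powers `powers' `v'_p`p'
    }
}
* Augmented regression and joint F test on all added powers
regress lntc lnq lnpl lnpk lnpf `powers'
test `powers'
* Clean up and restore the base model as the active estimation
drop `powers'
quietly regress lntc lnq lnpl lnpk lnpf
```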
We therefore consider adding the square of the explanatory variable lnq:

    gen lnq2 = lnq^2
    reg lntc lnq lnpl lnpk lnpf lnq2

          Source |       SS           df       MS      Number of obs   =       145
    -------------+----------------------------------   F(5, 139)       =    622.86
           Model |  278.630831         5  55.7261661   Prob > F        =    0.0000
        Residual |  12.4359927       139  .089467573   R-squared       =    0.9573
    -------------+----------------------------------   Adj R-squared   =    0.9557
           Total |  291.066823       144  2.02129738   Root MSE        =    .29911
    ------------------------------------------------------------------------------
            lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lnq |   .1166562   .0613522     1.90   0.059     -.004648    .2379605
            lnpl |   .0206146   .2326431     0.09   0.930    -.4393621    .4805913
            lnpk |   -.568725   .2614871    -2.17   0.031    -1.085732   -.0517185
            lnpf |   .4804816   .0766894     6.27   0.000     .3288531    .6321101
            lnq2 |   .0536124   .0053141    10.09   0.000     .0431055    .0641194
           _cons |  -.1627064   1.398139    -0.12   0.908    -2.927075    2.601662
    ------------------------------------------------------------------------------
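The output shows that the coefficient on lnq2 is significant (t = 10.09, p = 0.000), which indicates that the quadratic term in lnq does help explain the dependent variable.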
Now run the link test again:

    linktest

          Source |       SS           df       MS      Number of obs   =       145
    -------------+----------------------------------   F(2, 142)       =   1591.85
           Model |  278.638903         2  139.319451   Prob > F        =    0.0000
        Residual |  12.4279206       142  .087520568   R-squared       =    0.9573
    -------------+----------------------------------   Adj R-squared   =    0.9567
           Total |  291.066823       144  2.02129738   Root MSE        =    .29584
    ------------------------------------------------------------------------------
            lntc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            _hat |   1.009721   .0365875    27.60   0.000     .9373943    1.082047
          _hatsq |  -.0031437   .0103516    -0.30   0.762    -.0236068    .0173193
           _cons |  -.0013733   .0394759    -0.03   0.972    -.0794096    .0766631
    ------------------------------------------------------------------------------
The coefficient on the squared term (_hatsq) is no longer significant (p = 0.762). Run the RESET test again:

    estat ovtest

    Ramsey RESET test using powers of the fitted values of lntc
           Ho:  model has no omitted variables
                     F(3, 136) =      1.19
                      Prob > F =      0.3165
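The RESET test no longer rejects the null hypothesis (p = 0.3165), which again indicates that the functional-form misspecification has largely been eliminated.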
Note
This article was converted from a Jupyter notebook.